Increase MCP gateway health check timeout from 120s to 240s #9901

Closed
Copilot wants to merge 2 commits into main from copilot/fix-daily-news-workflow-timeout

Copilot AI commented Jan 14, 2026

The Daily News workflow has failed in 50% of runs since Jan 9 with exit code 7 (timeout) during MCP gateway startup. Gateway initialization starts multiple Docker containers (GitHub MCP server, safe-outputs) with API authentication, and this can exceed the hardcoded 120s timeout at peak times or under high network latency.

Changes

  • actions/setup/sh/start_mcp_gateway.sh: Timeout 120s → 240s, configurable via MCP_GATEWAY_HEALTH_TIMEOUT_SECONDS
  • actions/setup/sh/verify_mcp_gateway_health.sh: Same timeout increase and configurability

# Default: 240s
HEALTH_TIMEOUT_SECONDS="${MCP_GATEWAY_HEALTH_TIMEOUT_SECONDS:-240}"

# Override if needed:
env:
  MCP_GATEWAY_HEALTH_TIMEOUT_SECONDS: 300
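
For readers unfamiliar with the pattern, a configurable health timeout is typically consumed by a polling loop with a deadline. The following is a minimal sketch under stated assumptions, not the contents of the actual scripts: the function name `wait_for_health`, the polling interval, and the commented-out endpoint are all hypothetical. Only the `MCP_GATEWAY_HEALTH_TIMEOUT_SECONDS` variable and its 240s default come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: wait_for_health is an illustrative name, not a function
# from start_mcp_gateway.sh or verify_mcp_gateway_health.sh.

# Poll a health check command until it succeeds or the deadline passes.
# Usage: wait_for_health <timeout_seconds> <interval_seconds> <command> [args...]
wait_for_health() {
  local timeout="$1" interval="$2"
  shift 2
  # SECONDS is a bash built-in counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  while true; do
    if "$@"; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      echo "health check timed out after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Default of 240s, overridable through the environment (as described in the PR).
HEALTH_TIMEOUT_SECONDS="${MCP_GATEWAY_HEALTH_TIMEOUT_SECONDS:-240}"

# Example call (the URL is an assumption for illustration only):
# wait_for_health "$HEALTH_TIMEOUT_SECONDS" 2 curl -fsS http://localhost:8080/health
```

Because the loop returns as soon as the check passes, raising the ceiling only costs extra time on runs that would otherwise have failed.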

Impact

  • No recompilation needed (scripts are runtime-copied, not embedded)
  • Backward compatible (all workflows benefit immediately)
  • Workflow-level timeout remains 30 minutes (unchanged)
Original prompt

This section details the original issue you should resolve

<issue_title>[P1] Daily News Workflow Timeout Failures - 50% Success Rate</issue_title>
<issue_description># 🚨 High Priority: Daily News Workflow Timeout Failures

Summary

The Daily News workflow has degraded to 50% success rate (10/20 runs) with consistent timeout failures starting January 9, 2026. Users are not receiving daily repository news updates reliably.

Error Details

Sample Failed Run

Recent Run History

Run githubnext/gh-aw#101 (2026-01-13): failure
Run githubnext/gh-aw#100 (2026-01-12): failure
Run githubnext/gh-aw#99  (2026-01-09): failure
Run githubnext/gh-aw#98  (2026-01-08): success ✓
Run githubnext/gh-aw#97  (2026-01-07): success ✓
Run githubnext/gh-aw#96  (2026-01-06): success ✓
Run githubnext/gh-aw#95  (2026-01-05): success ✓

Suspected Root Causes

  1. Network/API Latency: Increased response times from external services
  2. Rate Limiting: GitHub API or external news sources throttling requests
  3. Resource Contention: Runner experiencing performance issues
  4. Timeout Configuration: 120s limit may be insufficient for peak times

Part of Systemic Pattern

  • Similar timeout pattern seen in CI Doctor workflow (0% success)
  • Both workflows started failing around same time (2026-01-09)
  • Both showing exit code 7 timeout errors
  • Suggests system-wide issue, not workflow-specific

Recommended Actions

Immediate (P1)

  1. Analyze Slow Operations

    • Identify which operations exceed timeout
    • Profile workflow execution time
    • Check for external API dependencies
  2. Review Timeout Configuration

    • Consider increasing timeout limit
    • Add timeout parameters to individual steps
    • Implement better retry logic
  3. Optimize Performance

    • Cache frequently accessed data
    • Parallelize independent operations
    • Reduce API call frequency if possible
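
The "better retry logic" recommendation could look roughly like the sketch below: retry a command a bounded number of times, doubling the delay after each failure. This is an illustrative helper only. The name `retry` and its argument order are assumptions, and nothing here is taken from the workflow's actual scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command with exponential backoff.
# Usage: retry <max_attempts> <initial_delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))      # double the wait before the next attempt
    attempt=$(( attempt + 1 ))
  done
}

# Example (the command is a placeholder for any flaky step):
# retry 4 5 docker pull some-image
```

Bounding the attempt count keeps the worst-case runtime predictable, which matters when an outer step timeout is also in force.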

Follow-up

  1. Add better logging/observability
  2. Implement timeout monitoring/alerting
  3. Create fallback mechanism for news aggregation
  4. Document performance baselines
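
One lightweight way to approach the logging and timeout-monitoring items is to time each step against its budget and emit a warning when it gets close. The sketch below assumes a hypothetical `timed` wrapper and an arbitrary 80% threshold. The `::warning::` prefix is the standard GitHub Actions workflow command for surfacing a warning annotation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, log how long it took, and warn when it
# used 80% or more of its time budget. The threshold is an arbitrary choice.
# Usage: timed <budget_seconds> <command> [args...]
timed() {
  local budget="$1"
  shift
  local start=$SECONDS status=0
  "$@" || status=$?
  local elapsed=$(( SECONDS - start ))
  echo "step '$*' took ${elapsed}s of ${budget}s budget (exit ${status})"
  if (( elapsed * 100 >= budget * 80 )); then
    # GitHub Actions renders this line as a warning annotation on the run.
    echo "::warning::step '$*' used ${elapsed}s of its ${budget}s budget"
  fi
  return "$status"
}

# Example (placeholder command):
# timed 240 ./start_gateway_and_wait.sh
```

Logged durations from successful runs would also provide the performance baseline the follow-up list asks for.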

Impact Assessment

  • User Impact: Inconsistent daily updates
  • Frequency: Daily scheduled workflow
  • Severity: High - affects user experience
  • Pattern: Part of larger timeout epidemic

Related Issues

  • CI Doctor workflow (P0) - same timeout pattern
  • Systemic timeout investigation needed
  • May affect other scheduled workflows

Detection

Identified by Workflow Health Manager on 2026-01-14
Health Score Impact: -5 points (75/100 overall)


Labels: workflow-health, priority-p1, type-failure, timeout
Related: CI Doctor timeout issue, systemic performance investigation

AI generated by Workflow Health Manager - Meta-Orchestrator

Comments on the Issue (you are @copilot in this section)


Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix timeout failures in daily news workflow" to "Increase MCP gateway health check timeout from 120s to 240s" on Jan 14, 2026
Copilot AI requested a review from mnkiefer January 14, 2026 03:41
pelikhan closed this on Jan 14, 2026